Distribution and diversity of ‘Tectomicrobia’, a deep-branching uncultivated bacterial lineage harboring rich producers of bioactive metabolites

Genomic and functional analyses of bacterial sponge symbionts belonging to the uncultivated candidate genus ‘Entotheonella’ has revealed them as the prolific producers of bioactive compounds previously identified from their invertebrate hosts. These studies also suggested ‘Entotheonella’ as the first members of a new candidate phylum, ‘Tectomicrobia’. Here we analyzed the phylogenetic structure and environmental distribution of this as-yet sparsely populated phylum-like lineage. The data show that ‘Entotheonella’ and other ‘Tectomicrobia’ are not restricted to marine habitats but widely distributed among terrestrial locations. The inferred phylogenetic trees suggest several intra-phylum lineages with diverse lifestyles. Of these, the previously described ‘Entotheonella’ lineage can be more accurately divided into at least three different candidate genera with the terrestrial ‘Candidatus Prasianella’, the largely terrestrial ‘Candidatus Allonella’, the ‘Candidatus Thalassonella’ comprising sponge-associated members, and the more widely distributed ‘Candidatus Entotheonella’. Genomic characterization of ‘Thalassonella’ members from a range of sponge hosts did not suggest a role as providers of natural products, despite high genomic similarity to ‘Entotheonella’ regarding primary metabolism and implied lifestyle. In contrast, the analysis revealed a correlation between the revised ‘Entotheonella’ 16S rRNA gene phylogeny and a specific association with sponges and their natural products. This feature might serve as a discovery method to accelerate the identification of new chemically rich ‘Entotheonella’ variants, and led to the identification of the first ‘Entotheonella’ symbiont in a non-tetractinellid sponge, Psammocinia sp., indicating a wide host distribution of ‘Entotheonella’-based chemical symbiosis.


INTRODUCTION
The bacterial tree of life contains numerous deep-branching lineages that lack cultivated representatives [1][2][3][4][5]. Data from 16S rRNA genes and whole environmental genomes, obtained by metagenomic binning or single-cell sequencing, support the existence of dozens of uncultivated phylum-like divisions that are distributed ubiquitously or in more specialized habitats. Among a large and growing list of examples are the Candidate Phyla Radiation (CPR, previously known as 'Patescibacteria') [3,[6][7][8], the SAR324 group detected in hydrothermal plumes [4,9], or 'Poribacteria' present in sponge microbiomes [10]. 'Omics' data suggest diverse and intriguing functions for such elusive taxa [11,12], but as experimental validation is usually challenging, verified functions remain limited.
In collaborative studies, we previously reported members of the candidate genus 'Entotheonella', first described by researchers at the Scripps Institution of Oceanography [13,14], as an uncultivated taxon with a remarkably rich specialized metabolism [15][16][17][18]. Genome data suggested 'Entotheonella' as members of a new candidate phylum, termed 'Tectomicrobia' [15], which is corroborated by a more recent reanalysis of the bacterial tree of life based on standardized classification criteria [3]. All sequenced 'Entotheonella' phylotypes with known morphology form multicellular filaments, have large genomes around 10 Mb, and colonize sponges with which they appear to form mutualistic associations involving chemical defense [15][16][17][18][19] and arsenic and heavy metal detoxification [12]. Marine sponges are prolific sources of bioactive natural products that may contribute to protecting the sessile animals against predators and epibionts [20]. In the demosponge Theonella swinhoei (order Tetractinellida, suborder Astrophorina, family Theonellidae), containing a particularly rich chemistry, bioinformatic and biochemical data attributed most of the known substances to the symbiont phylotypes 'Ca. Entotheonella factor' and 'Ca. Entotheonella serta' [15,16] (Table 1). These producers generate distinct sets of natural products and, in a mutually exclusive fashion, colonize two different host chemotypes, T. swinhoei Y and W, as members of microbiomes comprising numerous other bacteria [21][22][23], including the aurantoside producer 'Ca. Poriflexus aureus' (Chloroflexi) present in the Y chemotype [24]. In this sponge, 'E. factor' is also accompanied by another closely related variant, 'Ca. Entotheonella gemina' [15], while a further 'Entotheonella' phylotype is present in a T. swinhoei W sampled from the Red Sea. One or more additional 'Entotheonella' phylotypes were detected in the theonellid sponge Discodermia calyx and collectively linked to the production of three biosynthetically distinct compound families [19,25,26].
Besides biosynthetic gene clusters (BGCs) assigned to known sponge natural products, all 'Entotheonella' genomes sequenced to date ('E. serta', 'E. factor', and 'E. gemina') contain multiple biosynthetic loci for as-yet unknown compounds. Enzymatic studies on two of the unassigned BGCs suggest that the orphan clusters are likely functional and encode the biosynthesis of previously unknown metabolites [15,27,28]. Their remarkable metabolic capabilities classify 'Entotheonella' as the first uncultivated producer taxon with a chemical richness and variability comparable to important culturable drug discovery sources, such as filamentous Actinomycetota, Cyanobacteria, or Myxobacteria [29]. Their pharmacological potential, intriguing biology, and isolated phylogenetic position warrant further investigations into the distribution and functions of 'Tectomicrobia', which currently contain only four members in the Genome Taxonomy Database [3].
In this study, we sought to obtain insights into the phylogenetic structure and environmental distribution of 'Tectomicrobia', seeking to address the following three questions: Do further tectomicrobial taxa exist besides 'Entotheonella'? If so, what is their chemical potential? Is the association with sponges a general feature of this lineage or can free-living representatives be identified? Here, we present new sequence data, which along with re-analysis of publicly available sequences and corroborated by recent studies [30][31][32][33], show that 'Tectomicrobia' contains a much wider range of lineages that occur in diverse marine and terrestrial habitats. For one of these, the new sponge-associated candidate genus 'Thalassonella', we present eight metagenomeassembled genomes (MAGs). Within 'Entotheonella', we identified two distinct phylogenetic signatures that suggest the existence of generalist sponge colonizers as well as host-specific members, the latter including all known natural product-rich phylotypes. Based on these data, we tested whether a 16S rRNA gene-based prioritization strategy can pinpoint new 'Entotheonella' producers. This strategy resulted in the discovery of a new 'Entotheonella' variant in the dictyoceratid sponge Psammocinia sp., showing that small molecule-based symbiosis within 'Entotheonella' is not restricted to the demosponge order Tetractinellida.  [34] and preserved in 95% EtOH. Shallow water Geodia barretti (Demospongiae, Tetractinellida, Geodiidae) were collected by SCUBA diving in 2012 in the Bjørnsund Islands and Lysefjord, western Norway [35]. Other North Atlantic deep sea Geodia species (G. atlantica, G. barretti, G. hentscheli, G. macandrewii, G. parva, G. pachydermata, G. phlegraei) [36] were collected between 2004 and 2016 using remotely operating vehicles (ROVs) and trawls/dredges during various cruises in western Norwegian fjords, Kosterfjord (Sweden), Rockall Bank, the Greenland Sea, the Davis Strait, the Flemish Cap, Svalbard and the Galicia Bank. All specimens were preserved in 95% EtOH shortly after collection. DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen). Partial 16 S rRNA genes were amplified by a nested PCR approach using the bacterial primers 27 F (5'-AGA GTT TGA TCM TGG CTC AG-3') and 1492 R (5'-TAC GGY TAC CTT GTT ACG ACT T-3') in a first round of PCR and the 'Entotheonella'-specific primers Ento238F (5'-CCG GTC TGA GAT GAG CTT GC-3') [14] and Ento1442R (5'-TCA CCC CAA TCA CCC CGC-3') [15] in the second round of PCR. The resulting DNA fragments were either sequenced directly or subcloned into pJet1.2 using the CloneJet PCR Cloning Kit (Thermo Scientific) prior to Sanger sequencing. Specimens of G. barretti (Gb1-Gb2, Gb4-Gb10) and G. atlantica (Ga3) for metagenomic sequencing were collected by dredging onboard R/V Hans Brattström of the University of Bergen from Korsfjord, south of Bergen, Norway (60°8.13' N, 5°6.7' E) in September and October 2017 (Table S1). They were chopped and flash frozen with liquid nitrogen upon collection and stored at −80°C. Specimens (Gb126, Gb278 and Gb305) of G. barretti were collected during benthic trawls by the crews of the R/V Pâmiut of the Greenland Institute of Natural Resources during cruises conducted by Fisheries and Oceans Canada in the Davis Strait taken during the same season (Sept-Oct) from 2011-2015 and were stored at −20°C (Table S1) [37]. All G. barretti samples were crushed in liquid nitrogen to a fine powder with pestle and mortar. Two hundred mg of sponge tissue material were disrupted by bead beating using milling balls (5 × 2 mm + 2 × 5 mm) and 2 steps of shaking for 20 s at 4000 rpm as described in [38]. Tissue lysate was further used for DNA extraction with the AllPrep DNA/RNA/Protein Mini Kit (Qiagen) and sequenced by Novogene (Hong Kong) using the Illumina HiSeq PE150 platform.

MATERIALS AND METHODS Sponge collection and sequencing
Four sponges (Fascaplysinopsis sp. DOM10, Geodia sp. DOM14B, Aplysina sp. DOM33, Desmacella sp. DOM40) were collected by submersible near Portsmouth, Dominica (Table S1). Sponges were stored in RNALater (Thermo Fisher) at −20°C until further processing. Sponge pieces were defrosted, rinsed with sterile artificial seawater, chopped into small pieces, added to a PowerBead Tube and subjected to shaking three times for 60 s at 6000 × g using the Precellys® 24 Homogenizer (Bertin GmbH). DNA was subsequently extracted following the standard protocol of the DNeasy® PowerSoil® Pro Kit (Qiagen). Metagenomic DNA samples were sent to Novogene Europe (United Kingdom) for sequencing using the Illumina Novaseq6000 platform with the PE150 library and sequencing kits.

Soil sampling and sequencing
Soil samples were collected from 19 locations in Germany and Norway (Table S2). To isolate the metagenomic DNA of the samples, the PowerSoil® DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) was used. Partial 16 S rRNA genes were amplified by the same nested PCR approach detailed above.

Enrichment of soil bacteria
The collected soil was resuspended in 0.9% (w/v) aqueous NaCl solution using a hand blender. The suspension was stirred with a magnetic stirrer for 20 min then left undisturbed for additional 10 min to allow for settling of particles. The supernatant was decanted through a 32 µm Nytex mesh Isolation of filamentous bacteria from the sponge Psammocinia sp Filament-enriched bacterial cell fractions from Psammocinia sp. were prepared by a modified protocol previously described for Theonella sponges [15]. Briefly, the sponge tissue was cut into small pieces, immersed in buffered calcium-and magnesium-free artificial sea water (CMF-ASW: 10 mM Tris-Cl, 449 mM NaCl, 9 mM KCl, 33 mM Na 2 SO 4 , 2 mM NaHCO 3 ) and further processed using a mortar and pestle. The homogenized suspension was subsequently transferred to a Falcon tube and incubated with a mixture of collagenase I and IV (end concentration: 240 µg/ml) for 30 min at 37°C. After 10-fold dilution with CMF-ASW and addition of 2.5 mM ethylene glycol tetraacetic acid (EGTA), the solution was incubated on a rotating wheel at 4°C overnight. Next day, the solution was passed through a 32 µm Nytex mesh and centrifuged at 100 × g for 10 min to pellet filamentous bacteria. The pellet was washed 3 times with 500 µl CMF-ASW and stored at 4°C until use.

Phylogenetic analysis
Previously deposited tectomicrobial 16 S rRNA gene sequences were retrieved from the GenBank sequence database by BLASTing the fulllength 16 S rRNA gene sequence of 'Entotheonella factor' (KF926817). Hits above a threshold of 75% identity (>1000 sequences) were aligned using MUSCLE [39], followed by manual alignment correction. A total of 811 sequences were selected to create an alignment with maximal sequence length and without gaps in the region corresponding to positions 337 to 1078 in the E. coli homolog. To the dataset were added 187 sequences belonging to known species of Nitrospirota, Deltaproteobacteria (now reclassified into three phyla, Desulfobacterota, Myxococcota and SAR324), Alphaprotebacteria, Rokubacteria, Acidobacteriota and Nitrospinota as outgroups. After a first round of phylogenetic analysis using the Neighbor-Joining method [40], the putative tectomicrobial dataset was reduced from 811 to 447 sequences, with length ranging between 700 and 745 bp, that fulfilled the monophyly criterion. The initial phylogram was generated by the Neighbor-Joining method with 500 bootstrap resamplings using the Jukes-Cantor method to compute evolutionary distances. To further test for a placement within 'Tectomicrobia', the phylogeny was also reconstructed with the Maximum Likelihood algorithm with 500 bootstrap resamplings using the Generalized Time Reversible model [40,41] as suggested as the best substitution model by MEGA6. A discrete gamma distribution of 0.47 was used to model evolutionary rate differences among sites. The percentage of replicate trees in which the associated taxa clustered together in a bootstrap test of both algorithms are indicated in the final tree when either value was above 50%. All evolutionary analyses were conducted in MEGA6 [42] and visualized using the online tool iTOL [43]. A global identity matrix was used to calculate the median sequence identities (Hodges-Lehmann median) of all taxa, and a normal approximation was the basis for the upper and lower bounds for the 95% confidence interval.
A BLASTN search using the 16 S rRNA gene sequences from 'E. factor' and 'E. serta' as queries was used to detect 16 S rRNA gene sequences within the putative 'Tectomicrobia' bins. Of the 24 putative 'Tectomicrobia' bins, 22 had at least a fragmentary 16 S rRNA gene sequence, and 14 of those had at least 400 bp of overlap with the amplicon generated from our 'Entotheonella'-specific primers [15,16]. These sequences were aligned to the 811 used previously using MUSCLE, followed by trimming of the alignment to correct for different sequence lengths and manual realignment. The phylogenetic tree was regenerated for analysis using the Maximum Likelihood method as described above.

Phylogenomic and genomic analysis
Phylogenomic trees were constructed in Anvi'o [48] using the standard phylogenomics workflow laid out in the tutorial. Briefly, the putative 'Tectomicrobia' MAGs were imported into Anvi'o, genes were called with Prodigal [49], and an HMM profile was created. The protein sequences corresponding to 71 bacterial single-copy genes [50] were extracted, filtered to remove those present in less than 75% of MAGs, and concatenated; these concatenated sequences were then aligned with MUSCLE [39]. The alignment was first adjusted by manual trimming before being used to generate a phylogenomic tree using FastTree v2.1.11 [51] ( Fig. S8). BGCs were predicted using bacterial AntiSMASH 5.0 with 'relaxed' detection strictness [52]. The quality of the 'Thalassonella' MAGs was estimated with CheckM v1.1.2 [53]. Only medium-high-quality MAGs (>75% completeness, <5% contamination) were considered for downstream analysis. Genome annotation was performed with Prokka v1.14.6 [54]. Assignment of Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthologs (KO) and metabolic pathway reconstruction were conducted using the KofamKOALA [55] and BlastKOALA [56] online tools with default settings. For annotation with BlastKOALA, the 'species_prokaryotes' KEGG GENES database was used.

RESULTS AND DISCUSSION
Analysis of environmental samples for the presence of 'Tectomicrobia' Although our previous analyses had focused on 'Tectomicrobia' living in symbiosis with astrophorin sponges, preliminary E.E. Peters et al.
phylogenetic analyses also suggested the existence of this candidate phylum in other sponges and habitats. In an attempt to uncover further tectomicrobial diversity, we engaged in a targeted sequence prospecting campaign. For this purpose, we collected soil from 19 locations in Germany and Norway (Table S2) and isolated the DNA of enriched bacterial fractions prepared from these samples. To analyze phylogenetic divergence in these samples we generated 16 S rRNA gene sequences by a nested PCR approach previously established for the identification of 'Entotheonella' in metagenomic DNA samples [15] or by the use of the more general 16 S rRNA gene PCR primers, 27 F and 1492 R, on enriched bacterial cell pellets. In both cases, target amplicons were obtained and subcloned for sequencing, resulting in a combined 45 new near full length tectomicrobial 16 S rRNA gene sequences. A similar procedure was carried out on Geodia sponges collected from the North Atlantic as well as previously collected theonellid sponges from Japan and Israel [15,18], generating an additional 35 new tectomicrobial 16 S rRNA sequences, for a total of 80 (Table S3).

Phylogenetic analyses suggest various marine and terrestrial genus-like lineages in 'Tectomicrobia'
For the phylogenetic analyses, public databases were searched for additional potential tectomicrobial 16 S rRNA gene sequences to further enrich the dataset. In total, 447 sequences remained that were used in this study, consisting of 363 sequences derived from GenBank and our 84 newly acquired sequences. None of these originated from cultivated bacteria. A previous preliminary phylogenetic analysis of 'Tectomicrobia' had suggested the presence of three distinct clades, with the 'Entotheonella' lineage harboring most of the sequences [15]. Using our new, expanded dataset, phylograms inferred by maximum likelihood and neighbor-joining methods instead showed two deep-branching clades with high bootstrap support (Figs. 1A and S1-S3). Clade 1 comprised 119 sequences with low sequence divergence (94.3% median sequence identity, MSI) (Fig. 1B), suggesting a single candidate genus, for which we propose the name 'Candidatus Allonella' (from Greek ἄλλος, other). Clade 2 was further divided into at least three major subgroups that were well-supported by bootstrap replicates. All sequences previously designated 'Entotheonella' [15] fell into one of these three subgroups, along with 11 of the sequences generated in our sequencing campaign. Subgroup 1, which retains the name 'Entotheonella', had an MSI of 94.7%, suggesting a taxonomic ranking of a candidate genus according to the taxonomic thresholds proposed by Yarza et al. [57]. Subgroup 2 for which we propose the name 'Candidatus Thalassonella' (Greek: θάλασσα, sea; MSI = 96.6%), contained 66 marine sponge-derived sequences, 52 of which were obtained from our sampling of deep-sea Geodia spp. The third genus-like subgroup in Clade 2 emerged with the addition of the new soilderived 16 S rRNA sequences generated in this study, and contained 112 sequences of exclusively terrestrial origin. Because the first soil sample that was positively tested for Tectomicrobia originated from a vegetable garden, we propose the name 'Candidatus Prasianella' (ancient Greek, πρασιά, garden bed; MSI = 95.3%). In addition to their distinctive phylogeny, 'Prasianella' and 'Thalassonella' 16 S rRNA gene sequences contained as a diagnostic feature a 25-29 bp insertion in the V3 region, which 'Entotheonella' sequences lacked (Fig. S4). Two additional smaller subgroups, 't1' and 't2', which contained 10 and 11 sequences of mainly terrestrial origin respectively, were also present in Clade 2 (Fig. S2). However, further sequences are necessary to resolve the phylogenetic relationship between 't1', 't2' and members of 'Thalassonella' and 'Prasianella'. Comparison of the groups within Clade 2 showed 85.4-90.3% MSI (Fig. 1B), suggesting that Clade 2 represents a candidate family, 'Entotheonellaceae', whereas Clades 1 and 2 had an MSI of 80.8%, suggesting membership in a shared class or phylum according to Yarza et al. thresholds [57]. Although our previous efforts to identify reservoirs of 'Tectomicrobia' had focused on marine sponges, the current analysis shows that members are widespread in soil. In total, 54.1% of the tectomicrobial sequences derive from soil habitats, 5.0% from freshwater, 1.4% from saltwater and 39.5% from marine sponges (Fig. 1C), although this distribution is likely heavily affected by sampling bias.   Fig. 1 Phylogeny and environmental distribution of the candidate phylum Tectomicrobia based on 16 S rRNA gene sequences. A Phylogenetic tree inferred from 456 tectomicrobial sequences under the neighbor-joining criterion. Support of individual branches by bootstrap values above 50% is indicated at the respective nodes for the neighbor-joining method (left) and the maximum-likelihood method (right). Scale bar, 0.01 changes per nucleotide position. The number of sequences comprising a specific group is shown inside or next to the corresponding clade. The phylum Nitrospinota was used as an outgroup. The environmental distribution of sequences assigned to specific groups is shown as a pie chart next to the corresponding group. The subgroups 't1' and 't2' , present in Clade 2, contained 10 and 11 sequences of mainly terrestrial origin respectively, but further data are needed to resolve their relationship with the rest of the Clade. B Median sequence identities within (main diagonal) and between (off-diagonal) the genus-level clades described in this study. A more detailed version of this figure, showing higher and lower-level divisions, is provided in Fig. S5. C Pie chart diagram illustrating the environmental distribution of 'Tectomicrobia' based on 456 16 S rRNA gene sequences. A list of sequences generated in this study can be found in Table S3. Relationship between phylogeny and environmental distribution within 'Tectomicrobia' lineages All 112 sequences within the genus 'Prasianella' originated from terrestrial sources, but no clear further differentiated relationship could be identified between phylogeny and geographic origin (Fig. S3). Similarly, the geographic distribution of 'Thalassonella' covered various marine regions including the Pacific, Atlantic, Arctic and Indian Oceans (Fig. 2). Among the 66 'Thalassonella' sequences, most originated from various Geodia spp. sponges (37 sequences), followed by the 16 sequences from the two Xestospongia species Xestospongia testudinaria (Manado, Indonesia) and Xestospongia muta (Key Largo, Florida, USA) [58]. The large majority of sponges with 'Thalassonella' are high-microbial abundance (HMA) sponges (Geodia spp., Xestospongia spp., Aplysina spp., Ecionemia alata, Ircinia strobilina, Vaceletia crypta, Plakortis halichondrioides), with one low-microbial abundance (LMA) sponge (Tedania ignis) and two species of more uncertain status (Astrosclera willeyana, Haliclona tubifera) [59][60][61]. 'Thalassonella' subclades showed a high frequency of shared sponge hosts (phylosymbiosis), particularly for the two sponge genera for which several 'Thalassonella' sequences were available, Geodia and Xestospongia (Fig. 2). The 'Thalassonella' group also contained a single sequence from a diseased coral (Montastraea faveolata) as the only non-sponge-derived representative. However, since a corresponding sequence was not reported from a nearby, healthy coral, it might originate from a contamination rather than the coral microbiome [62]. Although no geographic patterns were observed (Fig. 2), sequences from similar water depths clustered together, albeit with moderately high bootstrap support, as apparent from the existence of individual 'Thalassonella' Geodia subclades comprising sublittoral to upper bathyal (36-787 m) and lower bathyal members (688-1462 m), as well as a 'Thalassonella' Xestospongia subclade comprising sublittoral (36-200 m) members (Fig. 2). Most Geodia tested had only one ASV, which was confirmed by subcloning 12 specimens; the exceptions were G. barretti PC619 and PC738 from Norway, G. barretti PC958 from Davis Strait, and G. phlegraei PC12 from Norway, which had two ASVs (Table S3). Thus, water depth, or more likely water masses, along with the sponge host species, appeared to be major determinants influencing the distribution of 'Thalassonella'. The sharing of specific ASVs by different Geodia species over large geographical distances could reflect Geodia population connectivity [63]. For example, G. barretti and G. macandrewii sharing one ASV between the upper bathyal Svalbard and Davis Strait (~3700 km apart) may reflect host-microbe connections via the deep-sea East Greenland current, followed by the West Greenland current. Depth has been reported in several studies as a factor that stratifies marine bacterial communities at the species (e.g., Bacillus cereus [64]), phylum (e.g., SAR11 [65]), and global level [66]. However, more thorough sampling will be required to rule out potential confounding factors.
'Entotheonella' members exhibit two contrasting phylogenetic patterns Phylogenetic insights were of particular interest for members of 'Entotheonella' as rich producers of bioactive natural products in sponges [15,16,18,19,25,26]. In contrast to 'Thalassonella', only limited relationship was initially apparent between bacterial phylogeny and host affiliation (Fig. 3). Also, host sponges here were a mix of HMA (Theonella, Discodermia, Aplysina, Ircinia, Agelas, Pseudoceratina purpurea) and LMA (Cliona, Stylissa, Hexadella, Dysidea, Callyspongia) sponges [59,67]. However, closer analysis revealed two distinct distribution patterns among 'Entotheonella' phylotypes. All sequences corresponding to the previously described metabolically talented 'Entotheonella' phylotypes fell into one clade (group I in Fig. 3, MSI = 96.3% in Fig. S5) with relatively high sequence divergence and a branching topology that suggested some degree of specificity with sponge hosts. A contrasting pattern was found in groups IIa (MSI = 98.5%) and IIb (MSI = 99.4%) (Figs. 3 and S5), which harboured 'Entotheonella' sequences with highly similar 16 S rRNA genes that were detected exclusively by PCR in a wide range of sponges. These closely related 'Entotheonella' phylotypes are clearly not tied to the phylogeny of their hosts; they might either be host-promiscuous symbionts or non-symbiotic contaminants from seawater. In some sponge species, e.g., T. swinhoei or Discodermia dissoluta, the same sponge specimen was found to contain sequences from groups I and IIa/IIb. The final 'Entotheonella' clade (group IIc with low bootstrap support) showed slightly higher sequence divergence (MSI = 97.3%) than groups IIa and IIb. While full-length 16 S rRNA gene sequences might be needed to improve bootstrap values, it is noteworthy that the 'Entotheonella' lineage contains so many different phylotypes, both in absolute terms and relative to the phylum as a whole. Given the large number of BGCs in each 'Entotheonella' genome studied to date and the uniqueness of each BGC inventory [16,28], this suggests a wealth of biosynthetic novelty yet to be discovered.
Since group I included all biosynthetically talented 'Entotheonella' previously characterized by genomic, microscopic, and spectroscopic methods, we were intrigued by the presence of additional group I members from terrestrial habitats, including soil samples collected at two distant locations in Germany and samples derived from dump layers of the leafcutter ant Atta colombica. However, our attempts to use mechanical enrichment and CARD-FISH-based visualization methods [18] to further characterize 'Entotheonella' in positive soil samples were unsuccessful.
A potential 'Entotheonella' source of the anticancer polyketide psymberin in the sponge Psammocinia sp The data on group I containing all known symbiotic 'Entotheonella' variants raised the question of whether additional spongeassociated producers can be identified in this clade. A promising candidate was a sequence amplified from metagenomic DNA of the sponge Psammocinia sp. (order Dictyoceratida, family Irciniidae) with 96.1% identity to the 16 S rRNA gene of 'Entotheonella factor'. This sponge was previously shown to contain psymberin [68,69], a cytotoxin with nanomolar activity against a variety of tumor cell lines [69,70]. Psymberin belongs to the pederin-type family of polyketides that are produced by symbionts from diverse bacterial phyla [71], including onnamides and theopederins in T. swinhoei by 'E. factor' [15] and mycalamides in the sponge Mycale hentscheli by a bacterium of the gammaproteobacterial UBA10353 group [72]. From a metagenomic DNA library of Psammocinia sp., we had previously isolated the BGC for psymberin, but the producer could not be identified at the time [73]. However, reanalysis of the genes surrounding the isolated gene cluster showed high amino acid sequence identity (81-99%) to 'Entotheonella' proteins for three gene products ( Fig. S6B; Table S4), suggesting an 'Entotheonella' symbiont as the possible source of psymberin. Initial attempts to detect filamentous bacteria in Psammocinia sp. by the established mechanical enrichment technique were unsuccessful, but a modified protocol involving the addition of collagenases and EGTA released copious amounts of multicellular filaments from the sponge extracellular matrix. To further characterize these filaments, we performed catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) experiments using the previously reported 'Entotheonella'-specific probe ESP-219 on 30 µm sponge tissue slices [18]. This method resulted in selective labeling of the filamentous bacteria ( Fig. 4 and S6C), which we subsequently named 'Candidatus Entotheonella consociata' for its tight association with the sponge matrix (Latin, consociata; associated). The localization of these filaments to the inner pore surfaces of the sponge resembles previous findings in Theonella sponges [17,18]. Psammocinia representing the first reported non-astrophorin sponge genus containing 'Entotheonella' and even belonging to a different sponge subclass (Keratosa) underlines the importance of this taxon for the biology of sponges. Efforts to directly sequence DNA isolated from the enriched cell pellet were unsuccessful, a phenomenon that we have also encountered for other 'Entotheonella'-containing samples [16,18] but that might be overcome by single-bacterial sequencing [24].
Metagenome-assembled genomes of 'Ca. Thalassonella' from diverse sponges lack biosynthetic richness The lack of genomic information about 'Tectomicrobia' other than 'Entotheonella' is a self-reinforcing problem. Due to the lack of reference genomes against which to compare draft metagenomeassembled genomes (MAGs), these newly sequenced MAGs cannot be unambiguously assigned to the candidate phylum, or to the subordinate taxonomic groups introduced in this study but defined from 16 S rRNA gene relatedness. This can be a challenge for phylogenomic inference tools such as the GTDB [3], which assigns taxonomy based on genome similarity to this limited reference set. Several metagenomics papers have mentioned the presence of 'Tectomicrobia' in metagenomes without addressing how these sequences relate to the available 'Entotheonella' data [30][31][32][33]. Because of their conservation, 16 S rRNA gene sequences are rarely assembled successfully from metagenomes, preventing a direct comparison of these 'Tectomicrobia' MAGs with the 16 S rRNA gene-based phylogeny presented here.
Among the metagenomes of 17 sponges recently sequenced by our groups, the GTDB-Tk assigned 24 MAGs with completeness greater than 50% and no more than 5% contamination as belonging to 'Tectomicrobia' (Table S1). Crucially, BLAST searches found that 14 of these MAGs contained 16 S rRNA gene sequences with lengths ranging from 399 to 747 bp. These sequences were aligned to those used for phylogenetic analysis in this study. Five of these MAGs clustered with reference groups of 'Tectomicrobia', highlighting the challenge of phylogenomic assignment in a sparsely covered region of sequence space, while one diverged early from 'Entotheonellaceae' but could not be placed into a candidate genus, and the remaining nine fell clearly into the candidate genus 'Thalassonella' (Fig. 5A and Table S1). Although the process of trimming the alignments to accommodate these incomplete 16 S rRNA gene sequences removed some of the subgenus level resolution in the phylogenetic tree, the biogeographic signatures within 'Thalassonella' can still be distinguished in the condensed tree (Fig. S7), and the groups proposed in this study correctly distinguish Geodia-derived MAGs from those derived from non-astrophorin sponges.
Finally, a phylogenomic tree was constructed in Anvi'o using these 15 MAGs for which 16 S rRNA gene sequences were available, together with, three 'Entotheonella' MAGs, and three MAGs from the phylum Nitrospinota as references (Fig. 5B). This tree, based on a concatenated alignment of 71 bacterial marker genes, shows remarkable concordance with the 16 S rRNA genebased tree, supporting all assignments at subgenus levels within the phylum 'Tectomicrobia'. The only exception is Gb4_35, which had an inconclusive 'Entotheonellales' designation based on its 16 S rRNA gene, but which phylogenomics reveals to belong to the 'sublittoral to upper bathyal Geodia group within 'Thalassonella'. Incorporation of putative 'Tectomicrobia' bins lacking 16 S rRNA gene sequences identified an additional seven MAGs as 'Thalassonella', and rejected two additional MAGs as being incorrectly assigned to 'Tectomicrobia' (Fig. S8 and Table S1), but uncovered no genomes belonging to other candidate 'Tectomicrobia' genera, either 'Entotheonella' or the yet-elusive 'Allonella' or 'Prasianella'.
To assess the biosynthetic potential of 'Thalassonella', all candidate 'Tectomicrobia' genomes were analyzed using anti-SMASH. In contrast to the numerous and largely unique BGCs of 'Entotheonella' [15,16,28], the 'Thalassonella' MAGs possess just 2-5 BGCs each, nearly all of which are conserved within the genus (Figs. 6, S9-13). Of particular note are two putative terpene BGCs --one present in all 17 'Thalassonella' MAGs and the other present in 11 of the 17 --that are also found in the three published 'Entotheonella' draft genomes [15][16][17]. A putative BGC for a modified lipid is present in 15 of 17 'Thalassonella' MAGs, and a unimodular type I polyketide synthase is found exclusively but universally in the 'sublittoral to upper bathyal Geodia' subgenus grouping. An orthologous PKS was previously observed in the sequenced metagenome of the sponge Plakortis simplex and six additional sponges, and given the designation swf [74], but those studies did not suggest a bacterial source. The reported swf gene clusters show clear sequence and organizational homology but are more divergent from the 'Thalassonella' sequences of this study than they are from one another (Fig. S13). Neither the lipidmodifying cluster nor the swf locus are found in 'Entotheonella' genomes, and none of the four conserved 'Thalassonella' BGCs are found in the outgroup MAGs. Although the functions of these clusters are unknown, their hypothetical products could be useful molecular biomarkers for the presence of 'Tectomicrobia' generally, in the case of the terpenes, or 'Thalassonella' specifically, in the case of the other BGCs. Indeed, using the terpene synthase in sequence similarity searches retrieved two MAGs that became publicly available after the completion of this study and that were both classified as 'Tectomicrobia' members (Genbank accession VGLS01001192 and [75]).
'Ca. Thalassonella' members lead a similar lifestyle to 'Entotheonella' Functional characterization and metabolic pathway reconstruction of 8 medium-high-quality 'Thalassonella' MAGs revealed important clues about their primary metabolism. On average, the analyzed MAGs were 87% complete and 2% contaminated with a genome size of 3.6 Mbp and 63% GC content (Table S5). Comparative genomic analysis suggested that 'Thalassonella' has a facultative anaerobic, heterotrophic metabolism highly similar to that previously reported for 'Entotheonella' [17]. The presence of genes coding for components of the respiratory chain and for oxygen-tolerant enzymes indicate the capacity of 'Thalassonella' to use oxygen as the terminal electron acceptor. Various types of cytochrome c oxidases (low and high O 2 affinity) identified in the genomes support previous findings on the ability of sponge symbionts to survive under different oxygen concentrations prevailing in actively pumping sponges [76,77]. Similarly to 'Entotheonella', multiple genes encoding CoA-transferases of family III (e.g. formyl-CoA transferase, benzylsuccinate CoAtransferase) and putative pathways for pyruvate fermentation to acetoin and 2,3 butanediol hint at a facultative anaerobic metabolism [17]. 'Thalassonella' MAGs harbored an almost complete set of genes for glycolysis (Embden-Meyerhof-Parnas pathway), the tricarboxylic acid (TCA) cycle and the pentose phosphate pathway (non-oxidative phase) (Table S6). It should be noted that the incompleteness of certain metabolic pathways must be interpreted cautiously as it could be attributed to sequencing or binning artifacts. However, in some cases key enzymes involved in core metabolic pathways might have alternatives performing the same function. For example, 'Thalassonella' MAGs were missing one of the key enzymes of the TCA cycle, namely the NADP-dependent isocitrate dehydrogenase [EC: 1.1.1.42], which is found in most prokaryotes, including 'Entotheonella' [17]. Instead, they contained the eukaryotic-type NADdependent isocitrate dehydrogenase [EC:1.1.1. 41]. No autotrophic fixation pathways were identified in any of the genomes. A heterotrophic lifestyle was also reflected in the presence of various ATP-binding cassette (ABC) transporters for amino acids, oligosaccharides, lipopolysaccharides, and lipoproteins (Table S6).
Gene-based evidence suggests that 'Thalassonella' has the capacity of using both methanol and oxalate as energy sources following the same energy acquisition strategy as 'Entotheonella'. Methanol is considered a key fuel for marine microorganisms sourced from atmospheric deposition [78] and phytoplankton production [79]. In the case of oxalate, it has been often observed in marine aerosols [80,81] and in the form of calcium oxalate in sponges [78]. 'Thalassonella' was also predicted to produce several amino acids, cofactors, and vitamins (Table S6). This biosynthetic potential has been previously reported for several sponge symbiotic lineages [17,33,82]. Since sponges are not capable of synthesizing several of these compounds, they are thought to obtain them either via filter-feeding or from their associated A Squalene-phytoene synthase Thalassonella  Fig. 1A showing the placement of 15 putative 'Tectomicrobia' bins in red by alignment of recovered 16 S sequences. B Unrooted phylogenomic tree of putative 'Tectomicrobia' MAGs containing 16 S sequences based on concatenation of conserved single-copy genes, including for reference three 'Entotheonella' draft genomes ('E. factor' , NCBI ID AZHW01; 'E. gemina' , AZHX01; 'E. serta' , PPXO01) and three draft genomes from the adjacent bacterial phylum Nitrospinae (MHDL01, PCWG01, DCWK01). The MAGs Gb4_35 and Gb10_1 showed a genome completeness below 75% and were therefore not included in the analyses of Table S6. Gb4_35 had an inconclusive taxonomic affiliation based on its 16 S rRNA gene but was assigned to the 'Thalassonella' sublittoral to upper bathyal Geodia group through phylogenomics. microorganisms [83]. Our findings suggest that members of the candidate genus 'Thalassonella' carry many traits typically found in sponge symbionts. Further, 'Thalassonella' members appeared to follow a similar lifestyle to 'Entotheonella' which contradicts the large difference in their BGC potential.

CONCLUSIONS
In the present study, the phylogenetic analysis of 447 16 S rRNA gene sequences, including marine and terrestrial samples obtained by targeted sequence prospecting, demonstrates that the candidate phylum 'Tectomicrobia' is composed of a diverse range of phylotypes globally distributed among various habitats. The phylum can be divided into several clades, some of which can be preliminarily associated with environmental factors such as, water depth, or association with marine sponges. In contrast to 'Entotheonella', the first draft genomes from the candidate genus 'Thalassonella' do not suggest natural product richness, while the primary metabolism of both genera appears to be highly similar. However, it remains unknown whether 'Ca. Entotheonella' represents the only talented genus within 'Tectomicrobia' due to the lack of additional tectomicrobial genomes spanning the remaining clades.
The analysis of the 'Entotheonella' genus suggests the existence of a subgroup of 'Entotheonella' members in phylosymbiosis with sponge hosts and a larger group of bacterial phylotypes that are highly similar to each other and show no co-speciation with the host. All of the known bioactive natural product producers in 'Tectomicrobia' fall into the former group. This interconnection offers an interesting possibility for the prioritization of sponges harboring group I 'Entotheonella', and has already led to the discovery of the first potential chemically productive 'Entotheonella' species in a non-astrophorin sponge. The evidence suggests that this symbiont, from Psammocinia sp., is the producer of the cytotoxic compound psymberin. Surprisingly, our studies also point towards the existence of bacteria within the 'Entotheonella' superproducer group I that are not associated with marine sponges, but no identification was possible beyond the 16 S rRNA gene amplicon as enrichment experiments were not met with success. Further studies will therefore be needed to evaluate whether they are as chemically productive as their spongeassociated counterparts and can be cultivated more easily.

DATA AVAILABILITY
16S rRNA gene sequences are available from the NCBI Sequence Read Archive under accession number OL753566 -OL753644 and OM256467; 'Thalassonella' MAGs are available from the European Nucleotide Archive under accession number ERS12129423 -ERS12129445.